
Feat/attafosu/sglang OpenAI api compatibility#156

Closed
attafosu wants to merge 5 commits into main from feat/attafosu/sglang-openai-api-compatibility

Conversation


@attafosu attafosu commented Mar 9, 2026

What does this PR do?

Adds a unified dataset preset supporting both the OpenAI-compatible and native SGLang APIs for CNN/DailyMail.

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

attafosu added 2 commits March 4, 2026 11:35
Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
Signed-off-by: attafosu <thomas.atta-fosu@intel.com>

Committer: attafosu <thomas.atta-fosu@intel.com>
@attafosu attafosu requested a review from a team as a code owner March 9, 2026 20:15
Copilot AI review requested due to automatic review settings March 9, 2026 20:15

github-actions bot commented Mar 9, 2026

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@github-actions github-actions bot requested review from arekay-nv and nvzhihanj March 9, 2026 20:15
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new feature that expands the dataset management capabilities by adding a llama3_8b_sglang preset for the CNN Dailymail dataset. This preset facilitates compatibility with both OpenAI and native SGLang APIs, streamlining the process of preparing data for inference. A key part of this enhancement is the modification of the Harmonize data transform, which now supports a "plain" tokenization mode, offering greater flexibility in prompt processing. Additionally, minor adjustments were made to OpenAI API type definitions to improve robustness.

Highlights

  • New Dataset Preset: Introduced a new llama3_8b_sglang preset for the CNN Dailymail dataset, supporting both OpenAI-compatible and native SGLang API usage.
  • Enhanced Harmonize Transform: Enhanced the Harmonize data transformation to include a "plain" mode, allowing for direct tokenization of raw prompts without conversation harmonization.
  • OpenAI API Type Updates: Updated OpenAI API type definitions to explicitly mark refusal, usage, and system_fingerprint fields as optional with default None values.


Changelog
  • src/inference_endpoint/dataset_manager/predefined/cnndailymail/presets.py
    • Imported the Harmonize transform.
    • Added a new function llama3_8b_sglang which defines a list of transforms for prompt formatting and plain tokenization using Harmonize.
  • src/inference_endpoint/dataset_manager/transforms.py
    • Modified the Harmonize class constructor to accept a mode parameter ("harmony" or "plain").
    • Added validation for the mode parameter.
    • Updated the process_row method to conditionally tokenize prompts based on the mode and to guard against overwriting existing tokenized columns.
  • src/inference_endpoint/openai/types.py
    • Set default None values for the refusal field in ChatCompletionResponseMessage.
    • Set default None values for the usage and system_fingerprint fields in ChatCompletionResponse.
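The mode handling added to the Harmonize constructor can be sketched in isolation as follows. This is a minimal stand-alone sketch based on the changelog above; the real class also takes tokenizer/encoding arguments, so the signature here is an assumption.

```python
class Harmonize:
    """Minimal sketch of the transform's new mode handling.

    The real Harmonize in transforms.py also accepts tokenizer and
    encoding arguments; only the mode validation is shown here.
    """

    def __init__(self, mode: str = "harmony"):
        # Reject anything other than the two supported tokenization modes.
        if mode not in {"harmony", "plain"}:
            raise ValueError(f"Invalid harmonize mode: {mode!r}")
        self.mode = mode
```

Validating in the constructor (rather than in `process_row`) means a bad preset fails fast at pipeline-construction time instead of on the first row.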
Activity
  • Pull request created by attafosu.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request introduces a new llama3_8b_sglang preset for the CNN/DailyMail dataset, enhancing compatibility with the SGLang API. It also refines the Harmonize transform by adding a mode parameter and corresponding logic to handle both 'harmony' and 'plain' tokenization, which is crucial for flexible prompt processing. Additionally, optional fields in OpenAI API types (refusal, usage, system_fingerprint) are now explicitly set with None as a default, improving clarity and consistency. The changes are well-structured and contribute positively to the codebase.

Comment on lines +55 to +59
    stream: bool = True,
    max_new_tokens: int = 128,
    temperature: float = 0.0,
    top_p: float = 1.0,
    top_k: int = 1,


Severity: medium

The parameters stream, temperature, top_p, and top_k are defined in the llama3_8b_sglang function signature but are not used within the function body. This indicates dead code, which can be misleading and suggests that these parameters might be intended for future use or were overlooked. If these parameters are not meant to be used, they should be removed. If they are intended to be used, their functionality should be implemented.


Copilot AI left a comment


Pull request overview

Adds an SGLang-focused CNN/DailyMail preset and improves OpenAI msgspec compatibility by making optional response fields truly optional during struct construction/decoding.

Changes:

  • Make msgspec OpenAI response fields (refusal, usage, system_fingerprint) default to None for better OpenAI-compat decoding/serialization.
  • Extend Harmonize with a mode option (harmony vs plain) and add a per-row guard to avoid overwriting pre-tokenized rows in fused row-processor pipelines.
  • Add a new llama3_8b_sglang CNN/DailyMail preset that formats prompts and produces input_tokens via Harmonize(mode="plain").

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
src/inference_endpoint/openai/types.py Makes msgspec OpenAI response structs tolerant of omitted optional fields.
src/inference_endpoint/dataset_manager/transforms.py Adds Harmonize plain-tokenization mode and overwrite-prevention in fused pipelines.
src/inference_endpoint/dataset_manager/predefined/cnndailymail/presets.py Adds an SGLang-oriented Llama 3.1 8B preset producing input_tokens.


Comment on lines +185 to +186
    if self.tokenized_column in row and row[self.tokenized_column] is not None:
        return row

Copilot AI Mar 9, 2026


The pre-tokenized guard treats any non-None value as “already tokenized”. In pandas rows, missing values are often NaN (which is not None), so this would incorrectly skip tokenization and leave NaN in input_tokens, likely breaking downstream code that expects a list of token IDs. Consider using an explicit null check that treats NaN as missing (e.g., via pd.isna) before returning early.

Comment on lines +154 to +156
    self.mode = mode
    if self.mode not in {"harmony", "plain"}:
        raise ValueError(f"Invalid harmonize mode: {self.mode}")

Copilot AI Mar 9, 2026


Harmonize.__call__ still skips purely based on the presence of tokenized_column in df.columns, but process_row now skips only when the per-row value is non-null. This makes behavior differ depending on whether row processors are fused (or if fuse_row_processors=False is used). Consider aligning the dataframe-level skip logic with the row-level guard so the transform behaves consistently.

Comment on lines 176 to +191

         Returns:
             Row dictionary with the harmonized prompt added
         """
-        row[self.tokenized_column] = self.harmonizer(row[self.prompt_column])
+        # Guard pre-tokenized rows: the SGLang adapter adds a default Harmonize
+        # (GPT-OSS tokenizer + harmony mode). When row processors are fused, the
+        # dataframe-level skip is bypassed, so without this guard, adapter
+        # Harmonize would overwrite input tokens. Alternative: remove Harmonize
+        # from the adapter transforms and require each SGLang preset to add its
+        # own Harmonize with the desired tokenizer/args.
+        if self.tokenized_column in row and row[self.tokenized_column] is not None:
+            return row
+        if self.mode == "plain":
+            tokens = self.harmonizer.to_tokens(row[self.prompt_column])
+            row[self.tokenized_column] = tokens
+        else:
+            row[self.tokenized_column] = self.harmonizer(row[self.prompt_column])

Copilot AI Mar 9, 2026


This change adds new Harmonize behavior (mode plus the overwrite-prevention guard when row processors are fused), but tests/unit/dataset_manager/test_transforms.py explicitly excludes Harmonize. Please add unit tests that cover (1) mode="plain" vs mode="harmony", and (2) fused pipelines where a second Harmonize should not overwrite existing input_tokens.
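A test along the lines requested could be sketched against a stub. `StubHarmonize` below is a hypothetical stand-in (not the real class in transforms.py): its tokenizer calls are placeholders, but it reproduces the two behaviors under test, the mode switch and the overwrite guard hit when row processors are fused.

```python
class StubHarmonize:
    """Hypothetical stand-in for Harmonize, reduced to guard + mode logic."""

    def __init__(self, mode: str = "harmony"):
        if mode not in {"harmony", "plain"}:
            raise ValueError(f"Invalid harmonize mode: {mode}")
        self.mode = mode
        self.prompt_column = "prompt"
        self.tokenized_column = "input_tokens"

    def _plain_tokens(self, text):
        # Placeholder for the tokenizer's to_tokens(); real code tokenizes.
        return [ord(c) for c in text]

    def _harmony(self, text):
        # Placeholder for Harmony-format rendering; marked with a leading 0.
        return [0] + self._plain_tokens(text)

    def process_row(self, row):
        # Overwrite guard: a second (fused) Harmonize must not clobber tokens.
        if self.tokenized_column in row and row[self.tokenized_column] is not None:
            return row
        if self.mode == "plain":
            row[self.tokenized_column] = self._plain_tokens(row[self.prompt_column])
        else:
            row[self.tokenized_column] = self._harmony(row[self.prompt_column])
        return row
```

The same assertions, pointed at the real Harmonize with a real tokenizer, would cover the gap the comment identifies.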

Comment on lines 83 to +113

@@ -109,5 +109,5 @@ class ChatCompletionResponse(msgspec.Struct, kw_only=True, omit_defaults=True):
     created: int
     model: str
     choices: list[ChatCompletionChoice]
-    usage: CompletionUsage | None
-    system_fingerprint: str | None
+    usage: CompletionUsage | None = None
+    system_fingerprint: str | None = None

Copilot AI Mar 9, 2026


There are no unit tests covering the msgspec OpenAI types / msgspec adapter decode path. Since these fields now default to None to support responses that omit them, it would be good to add a test that decodes a minimal OpenAI-compatible response missing refusal, usage, and system_fingerprint and asserts decoding succeeds and fields are None.

Comment on lines 157 to 159
    self.harmonizer = Harmonizer(
        tokenizer_name=tokenizer_name,
        encoding_name=encoding_name,

Copilot AI Mar 9, 2026


In mode="plain", process_row only calls self.harmonizer.to_tokens(...), but Harmonizer.__init__ still loads the Harmony encoding and constructs Harmony system content. That’s potentially expensive and unnecessary for plain tokenization. Consider a lightweight path for plain mode (e.g., defer encoding load until __call__ is used, or use the underlying tokenizer directly) to reduce init overhead.

attafosu and others added 3 commits March 9, 2026 13:42
Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
* Handle case with string response

Handles the case where the response is a single string rather than a list; needed for the AMD submission, which was not calculating TPOT without this fix.
---------

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@attafosu attafosu force-pushed the feat/attafosu/sglang-openai-api-compatibility branch from ff66399 to 4dd91ee Compare March 9, 2026 20:52
@attafosu attafosu closed this Mar 9, 2026
@github-actions github-actions bot locked and limited conversation to collaborators Mar 9, 2026